Search results for "String metric"

showing 4 items of 4 documents

Indexed Two-Dimensional String Matching

2016

Settore INF/01 - Informatica; Two-dimensional index data structures; String searching algorithm; 0102 computer and information sciences; 02 engineering and technology; Approximate string matching; 01 natural sciences; Combinatorics; 010201 computation theory & mathematics; Index data structures for matrices or image; Indexing for matrices or image; 0202 electrical engineering electronic engineering information engineering; Two-dimensional indexing for pattern matching; 020201 artificial intelligence & image processing; String metric; Mathematics
researchProduct

Active learning strategies for the deduplication of electronic patient data using classification trees.

2012

Highlights:
• Active learning for medical record linkage is used on a large data set.
• We compare a simple active learning strategy with a more sophisticated variant.
• The active learning method of Sarawagi and Bhamidipaty (2002) [6] is extended.
• We deliver insights into the variations of the results due to random sampling in the active learning strategies.

Introduction: Supervised record linkage methods often require a clerical review to gain informative training data. Active learning means to actively prompt the user to label data with special characteristics in order to minimise the review costs. We conducted an empirical evaluation to investigate whether…

Active learning; Computer science; Active learning (machine learning); Information Storage and Retrieval; Context (language use); Health Informatics; Semi-supervised learning; Machine learning; computer.software_genre; Set (abstract data type); Artificial Intelligence; Bagging; Data deduplication; Electronic Health Records; Humans; business.industry; String (computer science); Decision Trees; Online machine learning; Computer Science Applications; Data mining; Artificial intelligence; Medical Record Linkage; String metric; business; computer; Algorithms
Journal of biomedical informatics
researchProduct

Top-k String Similarity Joins

2020

Top-k joins have been extensively studied in relational databases as ranking operations when every object has, among others, at least one ranking attribute. However, the focus has mostly been on the case where the join attributes are of primitive data types (e.g., numerical values) and the join predicate is equality. In this work, we consider string objects assigned such ranking attributes, or simply scores. Given two collections of string objects and a string similarity measure (e.g., the edit distance), we introduce the top-k string similarity join (), which returns k sufficiently similar pairs of objects with respect to a similarity threshold ϵ, which have the highest combined score computed by…
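The join described in this abstract can be illustrated with a brute-force sketch. It is not the paper's algorithm: the function names `edit_distance` and `topk_similarity_join` are hypothetical, and because the abstract is cut off before naming how scores are combined, summing the two scores is an assumption made only for illustration.

```python
import heapq
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def topk_similarity_join(left, right, k, eps):
    """Return the k pairs within edit distance eps with the highest combined score.

    left and right are lists of (string, score) tuples.
    ASSUMPTION: the combined score is the sum of the two scores.
    Brute force over all pairs; no indexing or pruning is attempted.
    """
    candidates = (
        (ls + rs, l, r)
        for (l, ls), (r, rs) in product(left, right)
        if edit_distance(l, r) <= eps
    )
    return heapq.nlargest(k, candidates)


left = [("kitten", 5), ("apple", 9)]
right = [("sitting", 4), ("apply", 1), ("mitten", 3)]
print(topk_similarity_join(left, right, k=2, eps=2))
```

With these made-up inputs, only the pairs ("apple", "apply") and ("kitten", "mitten") fall within the threshold, so they are the two results, ranked by their summed scores.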

Theoretical computer science; Similarity (network science); Computer science; String (computer science); Joins; Join (sigma algebra); Edit distance; String metric; Aggregate function; Ranking (information retrieval)
32nd International Conference on Scientific and Statistical Database Management
researchProduct

Combining a context aware neural network with a denoising autoencoder for measuring string similarities

2020

Abstract: Measuring similarities between strings is central to many established and fast-growing research areas, including information retrieval, biology, and natural-language processing. The traditional approach to string similarity measurement is to define a metric with respect to a word space that quantifies and sums up the differences between characters in two strings; surprisingly, these metrics have not evolved a great deal over the past few decades. Indeed, the majority of them are still based on a simple comparison between characters and character distributions, without considering the words' context. This paper proposes a string metric that encompasses similarities between str…
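The limitation named in this abstract, comparing character distributions without regard to context, can be seen in a small sketch. The function `char_distribution_distance` is a hypothetical illustration of a traditional character-level measure; it is not the metric the paper proposes.

```python
from collections import Counter


def char_distribution_distance(a: str, b: str) -> int:
    """Count the characters that appear in one string but not the other.

    Only character frequencies are compared; order and word context
    are ignored, which is the weakness the abstract points out.
    """
    ca, cb = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, so the sum of the
    # two one-sided differences is the symmetric multiset difference.
    return sum(((ca - cb) + (cb - ca)).values())


print(char_distribution_distance("listen", "silent"))  # anagrams look identical
print(char_distribution_distance("cat", "cart"))       # one extra character
```

Two unrelated anagrams receive a distance of zero under this measure, which is the kind of blind spot a context-aware metric is meant to address.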

Artificial neural network; Property (programming); Computer science; business.industry; String (computer science); 020206 networking & telecommunications; Context (language use); 02 engineering and technology; computer.software_genre; 01 natural sciences; Theoretical Computer Science; Human-Computer Interaction; Character (mathematics); 0103 physical sciences; Metric (mathematics); 0202 electrical engineering electronic engineering information engineering; Artificial intelligence; String metric; business; 010301 acoustics; computer; Software; Word (computer architecture); Natural language processing
Computer Speech & Language
researchProduct